Learning to maximize reward rate: a model based on semi-Markov decision processes
نویسندگان
چکیده
منابع مشابه
Learning to maximize reward rate: a model based on semi-Markov decision processes
WHEN ANIMALS HAVE TO MAKE A NUMBER OF DECISIONS DURING A LIMITED TIME INTERVAL, THEY FACE A FUNDAMENTAL PROBLEM: how much time they should spend on each decision in order to achieve the maximum possible total outcome. Deliberating more on one decision usually leads to more outcome but less time will remain for other decisions. In the framework of sequential sampling models, the question is how ...
متن کاملSemi-markov Decision Processes
Considered are infinite horizon semi-Markov decision processes (SMDPs) with finite state and action spaces. Total expected discounted reward and long-run average expected reward optimality criteria are reviewed. Solution methodology for each criterion is given, constraints and variance sensitivity are also discussed.
متن کاملSemi-Markov Decision Processes
The previous chapter dealt with the discrete-time Markov decision model. In this model, decisions can be made only at fixed epochs t = 0, 1, . . . . However, in many stochastic control problems the times between the decision epochs are not constant but random. A possible tool for analysing such problems is the semiMarkov decision model. In Section 7.1 we discuss the basic elements of this model...
متن کاملSolving Semi-Markov Decision Problems using Average Reward Reinforcement Learning
A large class of problems of sequential decision making under uncertainty, of which the underlying probability structure is a Markov process, can be modeled as stochastic dynamic programs (referred to, in general, as Markov decision problems or MDPs). However, the computational complexity of the classical MDP algorithms, such as value iteration and policy iteration, is prohibitive and can grow ...
متن کاملMarkov Decision Processes with Arbitrary Reward Processes
We consider a learning problem where the decision maker interacts with a standard Markov decision process, with the exception that the reward functions vary arbitrarily over time. We show that, against every possible realization of the reward process, the agent can perform as well—in hindsight—as every stationary policy. This generalizes the classical no-regret result for repeated games. Specif...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Frontiers in Neuroscience
سال: 2014
ISSN: 1662-453X
DOI: 10.3389/fnins.2014.00101